ABSTRACT

A facility is provided for implementing a content-based
publish-subscribe system using a group-based multicast.
The facility includes mapping all possible groups of the
publish-subscribe system to a smaller number of multicast
groups, wherein the smaller number of multicast groups
include brokers and the brokers have consumers. The mapping
includes clustering the brokers of the publish-subscribe
system into C clusters of multicast groups, wherein each
cluster of the C clusters has its own subset of multicast
groups, and wherein C>1. The clustered multicast groups are
then used to forward an event to consumers within the
content-based publish-subscribe system by multicasting the
event up to C times, each multicasting being to a different
cluster of the C clusters.

36 Claims, 8 Drawing Sheets 
[FIG. 2 (flowchart of the clustering process): Nc = total number of
brokers / desired number of clusters (200); choose a random unassigned
broker B and add it to the current cluster (210); add an unassigned
broker that is nearest to B in terms of latency (220); repeat until the
number of brokers in the current cluster = Nc (230); repeat until all
brokers are exhausted (240).]



U.S. Patent    Jan. 1, 2002    Sheet 3 of 8    US 6,336,119 B1



[FIG. 3A (flowchart of broker configuration for CGM): for each broker
of a cluster (300); let Ci = cluster number of this broker (in binary),
Bi = broker number of this broker within Ci (310); join all multicast
addresses of the form <Ci><Ri>, where Ri has a "1" in position Bi (320).]



[FIG. 3B (flowchart of broker configuration for TCGM): for each broker
(350); let Ci = cluster number of this broker (in binary), Bi = broker
number of this broker within Ci (360); join all multicast addresses of
the form <Ci><Ri>, where Ri has a "1" in position Bi and Ri has no more
than threshold number of "1"s, as well as the special "cluster
broadcast" address (370).]
[FIG. 4A (publisher matching tree): tree levels test ATTRIB A, ATTRIB B
and ATTRIB C; the leaves are subscriptions S1 through S6, annotated with
2D bit vectors of C rows and K columns (C = number of clusters, K =
number of brokers within cluster).]
[Figure (subscriber matching tree): tree levels test ATTRIB A, ATTRIB B
and ATTRIB C; the leaves are subscriptions, annotated with client IDs.]
[Figure (flowchart of subscription propagation): a new subscription is
made (500); at the subscribing broker, broadcast to all publishing
brokers: <subscription, cluster ID, broker ID within cluster> (510); at
each publishing broker, find the subscription Si in the matching tree
and, in Si's annotation, turn on the bit in the cluster ID row, broker
ID column (520).]
[Figure (flowchart of event matching with CGM): an event is published
(600); at the publishing broker, create a 2D bit-vector "mask" of all
zeros, with rows = number of clusters and columns = number of brokers
within a cluster (605); traverse the matching tree (610); for each leaf
node visited, perform an "OR" operation between the mask and the
annotation at the leaf node (615); at the end of traversal, multicast
the event once to each cluster having at least one "1" in a
corresponding row of the result mask, where the multicast addresses are
determined by appending each row in the result mask to the
corresponding cluster ID (in binary) (620); at the subscribing broker,
traverse the matching tree and send to each subscribing client (625).]
[FIG. 6B (flowchart of event matching with TCGM): an event is published
(650); traverse the matching tree as in the CGM algorithm (655); at the
end of traversal, multicast the event once to each cluster, where the
multicast addresses are determined as follows (660): for cluster ID Ci,
if the corresponding row in the result mask has no more than threshold
"1"s, the multicast address is Ci appended with the row's value (665);
for cluster ID Ci, if the corresponding row in the result mask has more
than threshold "1"s, the multicast address is Ci's special "cluster
broadcast" address (670); at the subscribing broker, traverse the
matching tree and send to each subscribing client, and discard the
event if there are no subscribers (675).]
METHOD AND SYSTEM FOR APPLYING
CLUSTER-BASED GROUP MULTICAST TO
CONTENT-BASED PUBLISH-SUBSCRIBE
SYSTEM

CROSS-REFERENCE TO RELATED 
APPLICATIONS 

This application is a division of U.S. application Ser. No. 
08/975,280, filed Nov. 20, 1997 which is a division of U.S. 
application Ser. No. 08/975,303, filed Nov. 20, 1997. 

This application contains subject matter which is related
to the subject matter of the following applications, each of
which is assigned to the same assignee as this application.
Each of the below listed applications is hereby incorporated
herein by reference in its entirety:

"Method And System For Matching Consumers To 
Events," Astley et al., Ser. No. 08/975,280, filed Nov. 20, 
1997, now U.S. Pat. No. 6,216,132; 

"Routing Messages Within A Network Using The Data 
Content Of The Message," Chandra et al., Ser. No. 08/975, 
303, filed Nov. 20, 1997, now U.S. Pat. No. 6,091,724; and 

"Method And System For Matching Consumers To 
Events Employing Content-Based Multicast Routing Using 
Approximate Groups," Astley et al., Ser. No. 09/538,471,
co-filed herewith, now pending.

TECHNICAL FIELD

This invention relates, in general, to event computing 
systems and, more particularly, to a content-based multicast 
routing technique which delivers events to consumers of an 
event computing system interested in a particular set of 
events. 

BACKGROUND ART 

A common practice for integrating autonomous components
within a computing system has been to utilize events.
Events are, for example, data generated by a provider and 
delivered through a communication medium, such as a 
computer network, hard disk, or random access memory, to 
a set of interested consumers. The providers and consumers 
need not know one another's identity, since delivery is 
provided through intermediary software. This independence 
between provider and consumer is known as decoupling. 

One example of an event computing system is a database 
event system. Modern database systems include support for 
event triggers. Event triggers associate a filter, which is a 
predicate that selects a subset of events and excludes the 
rest, with an action to take in response to events on the 
database. An event on a database is any change to the state 
of the database. 

In database event systems, gating tests have been used to 
determine which consumers of a system are interested in a 
particular event. That is, gating tests have been used to 
match filters in event triggers to events. As described in "A 
Predicate Matching Algorithm for Database Rule Systems," 
by Hanson et al., Proceedings of SIGMOD (1991), pp.
271-280, gating tests identify a single predicate for each
filter as primary, and tests are organized in a data structure
based on this primary predicate. Additionally, the data needs 
to be organized based on the primary predicate. 

Another example of an event computing system is a 
distributed event system, also known as a publish/subscribe 
system. A publish/subscribe system is a mechanism where 
subscribers express interest in future information by some 
selection criterion, publishers provide information, and the
mechanism delivers the information to all interested subscribers.
Current publish/subscribe systems organize information
around subjects (also called channels or streams).
Providers or publishers publish events to groups and consumers
or subscribers subscribe to all data from a particular
group.

One example of a publish/subscribe system is described in
detail in U.S. Pat. No. 5,557,798, issued to Skeen et al. on
Sep. 17, 1996, and entitled "Apparatus And Method For
Providing Decoupling Of Data Exchange Details For Providing
High Performance Communication Between Software
Processes", which is hereby incorporated herein by
reference in its entirety. In U.S. Pat. No. 5,557,798, the
publisher of an event annotates each message with an identifier
called a subject and a subscriber subscribes to a particular
subject. Thus, if a subscriber is interested in just a
portion of the events having a given subject, it would have
to receive the entire subject and then discard the unwanted
information.

Based on the foregoing, a need exists for a matching
capability that does not require the partitioning of data into
subjects. A further need exists for a matching capability that 
enables a consumer to use any filtering criterion expressible 
with the available predicates. Additionally, a need exists for 
a mechanism that allows a consumer to receive only the 
information that it desires, such that the filtering is done 
independent of the consumer. 

SUMMARY OF THE INVENTION

One approach to addressing the above-noted needs is
described in the above-incorporated, co-pending U.S. patent
application Ser. No. 08/975,280, entitled "Method and System
for Matching Consumers to Events." In this approach,
referred to herein as a content-based event computing
system, the matching facility includes a search data structure
(e.g., a search tree or search graph), which is used to
determine the consumers interested in a particular event.
Content-based subscription is the ability of subscribers to
specify interest in events based on operations limited only
by the structure of the events and the operations supported by
the pattern language.

Applicants have identified a problem with content-based
subscription which arises when using group-based
multicast, such as internet protocol (IP) multicasting of an
event. In a practical content-based subscription system, there
will typically be too many groups of clients or consumers to
use a multicast facility.

As one example, the environment of this invention may
include content-based, publish/subscribe systems deployed
over IP networks such as the Internet. Clients are either
publishers or subscribers, and are attached to machines
referred to herein as brokers. The publisher's broker receives
a published message (also referred to herein as an "event")
and delivers it to subscriber brokers at least one of whose
attached clients has a subscription matched by the message.
These subscriber brokers then forward the message to the at
least one attached client. Content-based systems are more
flexible and provide more selectivity than subject-based
systems. However, the multicast problem for content-based
message delivery middleware is more complex than for
subject-based delivery once the number of destinations for
messages becomes large. It may no longer be straightforward
or efficient to use IP multicast groups to distribute
messages of a content-based system over a network because
the number of such groups required grows rapidly with the
number of subscriptions. This number eventually becomes
so large that either the supported range of multicast
addresses is exceeded or the overhead of setting up and
listening to such a large number of multicast addresses
becomes excessive. The present invention addresses this
problem.

To summarize, provided herein is a method for implementing
a content-based publish-subscribe system using a
group-based multicast. The method includes: mapping possible
groups of the publish-subscribe system to a smaller
number of multicast groups, wherein the smaller number of
multicast groups includes brokers, and the brokers have
consumers; and using the smaller number of multicast
groups to forward an event to consumers within the
content-based publish-subscribe system.

In another aspect, the present invention includes at least
one program storage device readable by a machine, tangibly
embodying at least one program of instructions executable
by the machine to perform a method of implementing a
content-based publish-subscribe system using a group-based
multicast. The method includes: mapping possible groups of
the publish-subscribe system to a smaller number of multicast
groups, wherein the smaller number of multicast groups
comprise brokers, the brokers having consumers; and using
the smaller number of multicast groups to forward an event
to consumers within the content-based publish-subscribe
system.

In a further aspect, a system for implementing a content-based
publish-subscribe system using a group-based multicast
is provided. The system includes means for mapping
possible groups of the publish-subscribe system to a smaller
number of multicast groups, wherein the smaller number of
multicast groups comprise brokers, and the brokers have
consumers. The system further includes means for using the
smaller number of multicast groups to forward an event to
consumers within the content-based publish-subscribe
system.

To restate, the present invention applies clustering to
group multicast-based implementations of a content-based
publish-subscribe system. Furthermore, as an enhancement,
the invention employs thresholding to further reduce the
number of groups required. These processes, referred to
herein as cluster group multicast (CGM), provide multiple
advantages over existing art. For example, under conditions
of high match rate (i.e., very few subscribers are interested
in any given event) and high regionalism (i.e., subscribers
interested in an event are geographically co-located), CGM
is superior to flooding (described herein below). In addition,
when the cost of fringe-links (i.e., links connecting brokers
to the network) is highest, CGM is superior to other group
multicast techniques. Advantageously, group assignments in
CGM are static and can be created independent of the
subscriptions. Furthermore, it is possible to apply CGM to
reasonably sized broker networks in Internet protocol (IP)
version 4 and version 6.
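
As a rough check on the group savings that clustering and thresholding are meant to deliver, the counts stated in the detailed description can be evaluated numerically. The following Python sketch is illustrative only and is not part of the disclosed embodiments; the function names are hypothetical, N is assumed to be evenly divisible by C, and the TCGM count assumes that only non-empty subsets of at most T brokers need a group.

```python
from math import comb


def flat_groups(n):
    """Groups needed when every subset of n brokers is its own multicast group."""
    return 2 ** n


def cgm_groups(n, c):
    """CGM: c clusters of n // c brokers, each cluster enumerating its own subsets."""
    k = n // c  # assumption: n is evenly divisible by c
    return c * 2 ** k


def tcgm_groups(n, c, t):
    """TCGM: per cluster, subsets of 1..t brokers plus one 'all brokers' group."""
    k = n // c  # assumption: n is evenly divisible by c
    per_cluster = sum(comb(k, i) for i in range(1, t + 1)) + 1
    return c * per_cluster


if __name__ == "__main__":
    print(flat_groups(24))        # 16777216, i.e. 2**24
    print(cgm_groups(80, 8))      # 8192, i.e. 2**13
    print(tcgm_groups(80, 8, 3))  # 1408
```

With 80 brokers in 8 clusters of 10, the sketch reproduces the 2^13 groups of the CGM example, and a threshold of 3 (a value chosen here purely for illustration) lowers the count further.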

Additional features and advantages are realized through 
the techniques of the present invention. Other embodiments 
and aspects of the invention are described in detail herein 
and are considered part of the claimed invention. 

BRIEF DESCRIPTION OF THE DRAWINGS 

The above-described objects, advantages and features of 
the present invention, as well as others, will be more readily 
understood from the following detailed description of cer- 
tain preferred embodiments of the invention, when consid- 
ered in conjunction with the accompanying drawings in 
which: 
FIG. 1 depicts a sample environment for deployment of
one embodiment of a cluster group multicast (CGM) facility
in accordance with the present invention;

FIG. 2 is a flowchart of one embodiment of a clustering
process for a CGM facility in accordance with the present
invention;

FIG. 3A is a flowchart of one process embodiment for
configuring brokers for cluster group multicast in accordance
with the principles of the present invention;

FIG. 3B is a flowchart of an alternate embodiment for
configuring brokers for cluster group multicast in accordance
with the principles of the present invention, wherein
a threshold number is employed to signal use of a special
cluster broadcast address;

FIG. 4A depicts a publisher matching data structure for
use in event matching using cluster group multicast in
accordance with the present invention;

FIG. 4B is a subscriber matching data structure for use in
event matching using cluster group multicast in accordance
with the present invention;

FIG. 5 is a flowchart of one embodiment of a subscription
propagation process in accordance with the principles of the
present invention;

FIG. 6A is a flowchart of one embodiment for matching
an event to a subscribing client using cluster group matching
(CGM) in accordance with the principles of the present
invention; and

FIG. 6B is a flowchart of one embodiment for matching
an event to a subscribing client using threshold cluster group
matching (TCGM) in accordance with the principles of the
present invention.

BEST MODE FOR CARRYING OUT THE
INVENTION

As briefly noted above, a content-based publish-subscribe
system disseminates information in the form of "events" 
from those producing information (publishers), to interested 
parties (subscribers). The advantage of a content-based 
publish-subscribe system is that subscribers receive only 
those events for which they have expressed an interest. One 
way of expressing information is by specifying a predicate 
over an event schema. 

As shown in FIG. 1, a scalable publish-subscribe system,
generally denoted 100, is commonly realized on networks of
"broker" nodes, connected by "router" nodes 110. Subscribing
clients connect to subscribing brokers 120 and register
their interest in particular types of events. Likewise, publishers
connect to publishing broker nodes 130, and publish
events. The brokers are responsible for tracking subscriptions
and for routing events from publishers to the appropriate
set of subscribers.

Implementations of publish-subscribe systems on today's
network infrastructure are typically based on either matching
near the subscriber or matching near the publisher.

Matching near the subscriber is called "flooding". In the
flooding approach, every event is broadcast to all brokers,
which in turn match the event and deliver it to their
subscribers. This approach is inefficient since brokers that
may not eventually need the message still receive the event
and thus consume valuable network bandwidth. The situation
is particularly bad when subscriptions are regional in
nature.

When an event is matched near the publisher, it can be
routed to the right set of subscribing brokers by using (1)
point-to-point broker connections, (2) destination lists, or (3)
group multicast between brokers. Point-to-point routing is
inefficient because it fails to take advantage of the common
paths between brokers. Routing via destination lists is not
scalable since message headers, and thus the message itself,
can grow quickly. Moreover, routing via destination lists is
not directly supported by routers in current day network
infrastructure.

That leaves the approach of routing via group multicast,
to which the present invention is directed. Straightforward
implementations using group multicast-based routing, such
as an internet protocol (IP) multicast, produce systems that
are not scalable in the number of groups required. For
example, a system with N endpoints requires 2^N groups.
Using the currently deployed version of IP, this means that
systems may not have more than 24 endpoints, since the
number of groups supported in the best case is 2^24.

Thus, this invention is concerned with the problem of
efficiently matching an event against a set of subscriptions
and routing it from its publisher to subscribers within a
network of publish-subscribe brokers. Furthermore, this
invention is concerned with solving the problem within the
context of existing internet infrastructure and under
expected distributions of subscriptions. One such expected
subscriber distribution is called a "regional distribution" in
which subscribers with similar interest are located in the
same geographical region of the network, and furthermore,
publishers that produce events of interest may lie within the
same region as well.

Clustering is an existing technique for reducing the number
of groups required for message distribution. This invention
applies the clustering technique to group multicast
based implementations of a content-based publish-subscribe
system. Furthermore, this invention applies the technique of
thresholding (described later) to further reduce the number
of groups required.

The CGM process described herein is based on the use of
clusters: mutually exclusive subsets of brokers where each
subset has its own set of multicast groups. We observe that
if we divide N endpoints into 2 clusters, we reduce the
number of groups in each cluster to 2^(N/2) groups, and the total
number of groups to two times that. The cost of this
approach, however, is that it may be necessary to multicast
an event twice: once to a group in each cluster. In general,
if we divide N into C clusters, the total number of groups
needed is given by g = C * 2^(N/C). So, for example, if we have
2^13 multicast groups available, we can support 80 broker
endpoints by dividing them into 8 clusters of 10 brokers
each. Since the groups within a cluster enumerate all possible
combinations of brokers, each broker must join half
these groups (those that include the broker) at system
configuration time.

Each broker contains an instance of the subscription
matching engine with entries for all client subscriptions, and
sorts the resulting list of brokers by cluster. It then looks up
the group in each cluster that contains, for example, exactly
those brokers destined to receive the event. The publisher's
broker then performs up to C multicasts, where C is the
number of clusters. Some clusters may have no matching
brokers and are therefore skipped.

The choice of cluster assignment has a significant impact
on performance. For example, brokers that match a single
subscription, but that are spread over multiple clusters will
require multiple multicasts. One approach to clustering is to
build clusters by grouping brokers with similar subscription
sets. Another approach is to use geographic (or network)
location data to group brokers into clusters. This latter
approach performs well in the case when subscribers that are
geographically co-located also have similar subscriptions,
i.e., there is "regionalism" of subscriptions.

The number of groups required by the CGM process may
be reduced further by reducing the precision of the algorithm.
One approach to reducing the precision is to flood a
cluster when more than a threshold number of brokers
within that cluster need to receive an event. That is, the
process behaves like CGM unless the number of destinations
in a cluster exceeds a threshold, at which point the event is
multicast to the entire cluster. This algorithm is referred to
herein as threshold CGM (or TCGM).

For each cluster, we pick a threshold T, with T<K, where
K is the size of the cluster. If an event matches more than T
endpoints, the event is sent to all brokers in the cluster.
Otherwise, the event is sent only to the brokers subscribed
to the event (as in CGM). This process requires multicast
groups for all subsets of brokers in a cluster of size T or
smaller, plus one additional multicast group for an "all
brokers" multicast in the cluster. The group requirement for
TCGM can be many orders of magnitude smaller than in the
case of CGM.

As noted, in accordance with the present invention a
broker network, for example, network 100 of FIG. 1, needs
to be divided into a number of clusters, C. FIG. 2 depicts one
embodiment of a clustering process for achieving this. In
this process, the variable Nc is defined as the total number
of brokers divided by the desired number of clusters 200.
Once Nc is defined, the process begins by choosing a
random unassigned broker in the network 210. An unassigned
broker that is nearest to broker B in terms of latency
is selected for inclusion in the current cluster 220. This
process continues until the number of brokers in the current
cluster equals Nc 230. As the process proceeds, each visited
broker is marked as assigned and the process continues until
all C clusters are defined and all brokers are assigned to a
cluster 240. Note that the process of FIG. 2 for creating
clusters is provided by way of example only, i.e., several
alternative processes are also possible.

Before the publish-subscribe system can begin accepting
subscriptions and published events, the system must be
configured. Note that this configuration occurs once and is
static, i.e., unless the network itself changes. For the clustered
group multicast (CGM) process presented herein, this
configuration can be performed using the process of FIG.
3A. Initially, each broker 300 is assumed to be associated
with a cluster Ci, and assigned a broker number Bi within the
cluster Ci. The cluster number for the broker is expressed in
binary, while Bi is the broker number for the broker within
Ci 310. Multicast addresses are normally represented as
binary strings. The group multicast addresses that a broker
must join have the cluster number Ci (in binary) as their high
order bits and a "1" in the position corresponding to the
broker number Bi 320. Essentially, this process determines
each individual subset or group of the total number of
brokers in the cluster, represents each broker as a single bit,
and makes each broker join all of the groups which have
addresses with "1" in the position of that broker number
within the cluster.

As an alternate embodiment, FIG. 3B depicts a process
for configuring brokers when a threshold cluster group
multicast (TCGM) is to be employed. Again, each broker
350 is assumed to be associated with a cluster Ci, and is
assigned a broker number Bi within the cluster Ci 360. For
this process, the multicast addresses to join are determined
by selecting all addresses that have a "1" in the position Bi,
the broker number, and further, by selecting only those
addresses that have no more "1"s than a threshold number.
Each broker in a cluster also joins a special group address,
referred to herein as the "all brokers" or "cluster broadcast"
address 370.
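
The configuration step of FIGS. 3A and 3B can be illustrated with a short Python sketch. It is not taken from the patent: addresses are modeled as bit strings, position Bi is assumed to be counted from the least significant bit of Ri starting at 0, and the all-ones row is used as a stand-in for the "cluster broadcast" address (an assumption; the text does not say how that address is encoded). The function name and its parameters are hypothetical.

```python
from itertools import combinations


def addresses_to_join(cluster_id, broker_no, k, cluster_bits, threshold=None):
    """Return the group addresses (bit strings <Ci><Ri>) that one broker joins.

    threshold=None models CGM (FIG. 3A); an integer T models TCGM (FIG. 3B).
    """
    prefix = format(cluster_id, f"0{cluster_bits}b")   # Ci as the high-order bits
    others = [b for b in range(k) if b != broker_no]
    max_ones = k if threshold is None else threshold   # cap on "1"s allowed in Ri
    joined = set()
    # Every row contains this broker's bit plus `extra` bits of other brokers.
    for extra in range(max_ones):
        for combo in combinations(others, extra):
            row = 1 << broker_no
            for b in combo:
                row |= 1 << b
            joined.add(prefix + format(row, f"0{k}b"))
    if threshold is not None:
        # Assumed encoding of the special "cluster broadcast" address.
        joined.add(prefix + "1" * k)
    return joined


if __name__ == "__main__":
    print(sorted(addresses_to_join(2, 0, 3, 2)))               # CGM, K = 3
    print(sorted(addresses_to_join(2, 0, 4, 2, threshold=2)))  # TCGM, K = 4, T = 2
```

Under CGM the broker joins 2^(K-1) addresses, which matches the statement that each broker joins half of the groups in its cluster; under TCGM it joins only the rows with at most T ones plus the broadcast address.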

The subscription propagation and event matching and
routing processes in accordance with this invention are
written based on the data structures that are maintained by
each broker. Exemplary embodiments of these data structures
are depicted in FIGS. 4A & 4B, wherein a publisher
matching tree and a subscriber matching tree are shown,
respectively. Each publishing broker maintains a matching
tree of all subscriptions in which each leaf is annotated with
a bit vector as shown in FIG. 4A. The bit vector annotation
has C rows and a number of columns K equal to the number
of brokers in each cluster. The symbol shown in these figures
comprises a "don't care," meaning that the path is traversed
irrespective of the attribute value. Don't cares are described
in greater detail in the initially incorporated United States
Patent application entitled "Method and System for Matching
Consumers to Events." The trees are traversed by
comparing attribute values with values of the published
event. As shown in FIG. 4B, the subscribing brokers contain
a matching data structure used to match events to those
subscriptions that are applicable to the clients of that broker.
As shown, leaf nodes in this tree are annotated with identifiers
of all the clients that match a particular subscription.

FIG. 5 depicts one embodiment for propagating new
subscriptions throughout the system. Once a new subscription
is made 500, the subscription itself is broadcast by the
receiving broker, along with a cluster id of the subscribing
broker and the broker id of the subscribing broker within the
cluster 510. At each publishing broker, the broker finds the
subscription Si in the publisher matching data structure
(FIG. 4A), and in the Si's annotation, the publishing broker
turns on the corresponding bit in the cluster id row, broker
id column 520.
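
The bookkeeping at a publishing broker can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the disclosed implementation: the matching tree is replaced by a plain dictionary keyed by subscription, each annotation is a list of C integers (one bit-row per cluster), broker ids are 0-based bit positions, and a missing subscription is created on first use. All names are hypothetical.

```python
def record_subscription(annotations, subscription, cluster_id, broker_id, num_clusters):
    """Turn on the bit at (cluster_id row, broker_id column) for a subscription.

    `annotations` maps a subscription to its C-row bit matrix, stored as one
    integer per cluster row (an assumed stand-in for the leaf annotations of
    the publisher matching tree).
    """
    rows = annotations.setdefault(subscription, [0] * num_clusters)
    rows[cluster_id] |= 1 << broker_id
    return rows


if __name__ == "__main__":
    annotations = {}
    # Two brokers of cluster 1 (broker ids 2 and 0) subscribe to the same predicate.
    record_subscription(annotations, "price > 100", 1, 2, num_clusters=3)
    record_subscription(annotations, "price > 100", 1, 0, num_clusters=3)
    print(annotations)  # {'price > 100': [0, 5, 0]}
```

Storing each row as an integer keeps the later OR step of event matching to a single bitwise operation per cluster.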

FIG. 6A depicts one embodiment of cluster group multicast
(CGM) in accordance with the principles of the present
invention. Once an event is published 600, then at the
publishing broker, a 2D bit-vector "mask" of all zeros is
created, where the number of rows is the same as the number
of clusters, and the number of columns equals the number of
brokers within a cluster 605. The publisher matching tree
(FIG. 4A) is then traversed 610, and for each leaf node
visited, an OR operation is performed between the mask and
the annotation of the leaf node 615. At the end of this
traversal, the event is multicast once to each cluster containing
at least one interested subscriber 620. The multicast
addresses are determined by appending each nonzero row in
the result mask to the corresponding cluster id (in binary).
The group multicast infrastructure that exists within the
routers will then deliver the event to the subscribing brokers
in the clusters. At each subscribing broker, the subscriber
matching tree (FIG. 4B) is traversed and the event is sent to
each subscribing client 625.
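
The mask computation and address derivation at the publishing broker can be sketched in Python as follows. The sketch assumes the tree traversal has already produced the annotations of the visited leaves, models each annotation as a list of C integers (one bit-row per cluster), and renders addresses as bit strings; these choices and the function name are illustrative assumptions rather than details given in the text.

```python
def cgm_multicast_addresses(leaf_annotations, num_clusters, k, cluster_bits):
    """OR the annotations of all matched leaves and derive one address per cluster.

    Returns the addresses <Ci><row> for every cluster whose mask row is nonzero;
    clusters with an all-zero row are skipped.
    """
    mask = [0] * num_clusters                      # 2D mask of all zeros
    for annotation in leaf_annotations:
        for ci in range(num_clusters):
            mask[ci] |= annotation[ci]             # OR the mask with the leaf annotation
    addresses = []
    for ci, row in enumerate(mask):
        if row:                                    # at least one interested broker
            addresses.append(format(ci, f"0{cluster_bits}b") + format(row, f"0{k}b"))
    return addresses


if __name__ == "__main__":
    matched = [[0b001, 0, 0b100], [0b010, 0, 0]]   # annotations of two matched leaves
    print(cgm_multicast_addresses(matched, num_clusters=3, k=3, cluster_bits=2))
    # ['00011', '10100']
```

In this example the event is multicast twice, once to cluster 0 and once to cluster 2, while cluster 1 receives nothing.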

FIG. 6B depicts an alternate cluster group matching
process, herein referred to as the threshold cluster group
multicast (TCGM) process, which employs the above-discussed
thresholding technique. An event is again published
650 and the publisher matching tree is traversed 655
as described above in connection with the CGM process of
FIG. 6A. At the end of this traversal, the event is multicast
once to each cluster, where the multicast addresses are
determined as follows 660: for each cluster id Ci, if the
corresponding row in the result mask has a nonzero number
of "1"s, but no more than a "threshold", the multicast
address is Ci appended with the row's value 665. For cluster
id Ci, if the corresponding row in the result mask has more
than the threshold of "1"s, the multicast address is Ci's special
"cluster broadcast" address 670. At the subscribing broker,
the subscriber matching tree is again traversed and the event
is sent to each subscribing client 675. If there are no
subscribing clients, then the event is discarded.
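
The per-cluster address choice of steps 665 and 670 can be sketched in Python as follows. The sketch is illustrative only: a mask row is an integer of K bits, addresses are bit strings, and the "cluster broadcast" address is assumed to be the cluster id followed by K ones, an encoding the text does not specify. The function name is hypothetical.

```python
def tcgm_address(cluster_id, row, k, cluster_bits, threshold):
    """Pick the multicast address for one cluster under TCGM.

    Returns None when no broker in the cluster is interested, the exact
    group address when the row has at most `threshold` ones, and the
    (assumed) cluster broadcast address otherwise.
    """
    if row == 0:
        return None                                # nothing to send to this cluster
    prefix = format(cluster_id, f"0{cluster_bits}b")
    ones = bin(row).count("1")                     # number of interested brokers
    if ones <= threshold:
        return prefix + format(row, f"0{k}b")      # step 665: Ci appended with the row
    return prefix + "1" * k                        # step 670: assumed broadcast encoding


if __name__ == "__main__":
    print(tcgm_address(1, 0b0101, k=4, cluster_bits=2, threshold=2))  # 010101
    print(tcgm_address(1, 0b0111, k=4, cluster_bits=2, threshold=2))  # 011111
    print(tcgm_address(1, 0b0000, k=4, cluster_bits=2, threshold=2))  # None
```

A broker that receives a broadcast without having any matching client simply discards the event, which is the loss of precision that TCGM trades for fewer groups.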

Note that the present invention can be included, for 
example, in an article of manufacture (e.g., one or more 
computer program products) having, for instance, computer 
usable media. This media has embodied therein, for 
instance, computer readable program code means for pro- 
viding and facilitating the capabilities of the present inven- 
tion. The articles of manufacture can be included as part of 
the computer system or sold separately. 

Additionally, at least one program storage device readable 
by machine, tangibly embodying at least one program of 
instructions executable by the machine, to perform the 
capabilities of the present invention, can be provided. 

The flow diagrams depicted herein are provided by way of 
example. There may be variations to these diagrams or the 
steps (or operations) described herein without departing 
from the spirit of the invention. For instance, in certain 
cases, the steps may be performed in differing order, or steps 
may be added, deleted or modified. All of these variations 
are considered to comprise part of the present invention as 
recited in the appended claims. 

While the invention has been described in detail herein in 
accordance with certain preferred embodiments thereof, 
many modifications and changes therein may be effected by 
those skilled in the art. Accordingly, it is intended by the 
appended claims to cover all such modifications and changes 
as fall within the true spirit and scope of the invention. 

What is claimed is: 

1. A method of implementing a content-based publish- 
subscribe system using a group-based multicast, said 
method comprising: 

mapping possible groups of the content-based publish- 
subscribe system to a smaller number of multicast 
groups, wherein said smaller number of multicast 
groups comprise brokers, said brokers having consum- 
ers; 

using the smaller number of multicast groups to forward 
an event to interested consumers within the content- 
based publish-subscribe system; 

wherein said mapping comprises clustering brokers of the 
publish-subscribe system into C clusters, wherein 
each cluster of said C clusters has its own subset of 
multicast groups, and wherein C>1; and 

wherein said using comprises multicasting the event to 
interested consumers using the smaller number of 
groups, and wherein said multicasting comprises mul- 
ticasting the event up to C times, each multicasting 
being to interested consumers within a different cluster 
of said C clusters. 

2. The method of claim 1, wherein said clustering com- 
prises grouping brokers within said C clusters using geo- 
graphic proximity of brokers within said publish-subscribe 
system. 

3. The method of claim 1, wherein said using comprises: 
matching the event against all subscriptions of the 
publish-subscribe system; sorting a resulting list of brokers 
having subscriptions for the event by cluster; thereafter, 
ascertaining the multicast group in each cluster that contains 
those brokers destined to receive the event; and performing 
up to C multicasts of the event to those multicast groups of 
the C clusters, wherein each multicast goes to a different 
cluster of said C clusters. 

4. The method of claim 3, wherein said ascertaining the 
multicast group comprises ascertaining the multicast group 
of each cluster that contains precisely those brokers destined 
to receive the event. 

5. The method of claim 3, wherein said ascertaining the 
multicast group comprises ascertaining the multicast group 
of each cluster that approximately contains those brokers 
destined to receive the event, wherein the ascertaining is 
approximate in that one multicast group in each cluster 
comprises an "all broker" multicast group, and said 
performing up to C multicasts comprises, for a multicast to a 
particular cluster, using the "all broker" multicast group if 
the event is to be published to more than a threshold number 
T of brokers in the cluster, wherein T< a total number of 
brokers in the cluster. 

6. The method of claim 1, wherein said mapping comprises 
clustering all brokers of the publish-subscribe system 
into C clusters, each cluster comprising a mutually exclusive 
subset of brokers of the publish-subscribe system. 

7. The method of claim 6, wherein said clustering comprises 
choosing a broker of the publish-subscribe system that 
has not already been allocated to a cluster and building a 
cluster by using latency between brokers to group brokers 
within the cluster. 

8. The method of claim 7, wherein said mapping further 
comprises for each cluster, assigning brokers to each 
individual broker subset derived from a total number of brokers 
in the cluster, wherein one multicast group is assigned to 
each such broker subset, and wherein the brokers in each 
subset join the multicast group assigned to that subset. 

9. The method of claim 6, further comprising providing a 
publisher matching tree on each publishing broker of the 
publish-subscribe system, and providing a subscriber matching 
tree on each subscribing broker of the publish-subscribe 
system, wherein the publisher matching tree contains 
subscriptions annotated with a 2D bit vector of C rows and K 
columns, wherein C equals a number of clusters, and K 
equals a number of brokers within the particular cluster, and 
wherein said subscriber matching tree comprises 
subscriptions annotated with consumer ids. 

10. The method of claim 9, wherein said using comprises 
at a publishing broker, creating a 2D bit-vector "mask" of all 
zeros, wherein a number of rows of the "mask" equals a 
number of clusters and a number of columns of the "mask" 
equals a number of brokers within a cluster, traversing the 
publisher matching tree, and for each leaf node visited, 
performing an "OR" operation between the "mask" and the 
2D bit vector annotation at the leaf node, and upon 
completion of said traversing of the publisher matching tree, 
multicasting the event once to each cluster having at least 
one "1" in a corresponding row of a result mask obtained 
from said performing the "OR" operation, wherein multicast 
addresses are determined by appending each row in the 
result mask to a corresponding cluster id (in binary), and 
subsequent to said multicasting, traversing the subscriber 
matching tree at a subscribing broker, and sending the event 
to each subscribing client thereof. 

11. The method of claim 10, wherein said multicast 
addresses are determined as follows: for cluster id Ci, if a 
corresponding row in the result mask has no more than a 
threshold T of "1"s, its multicast address is Ci appended with 
the row's value; and for cluster id Ci, if a corresponding row 
in the result mask has more than the threshold T of "1"s, the 
multicast address is a special "cluster broadcast" address. 

12. The method of claim 11, further comprising at the 
subscribing broker, discarding the event if there are no 
subscribers to the event. 

13. At least one program storage device readable by a 
machine, tangibly embodying at least one program of 
instructions executable by the machine to perform a method 
of implementing a content-based publish-subscribe system 
using a group-based multicast, said method comprising: 

mapping possible groups of the content-based publish- 
subscribe system to a smaller number of multicast 
groups, wherein said smaller number of multicast 
groups comprise brokers, said brokers having 
consumers; 

using the smaller number of multicast groups to forward 
an event to interested consumers within the content- 
based publish-subscribe system; 

wherein said mapping comprises clustering brokers of the 
publish-subscribe system into C clusters, wherein each 
cluster of said C clusters has its own subset of multicast 
groups, and wherein C>1; and 

wherein said using comprises multicasting the event to 
interested consumers using the smaller number of 
groups, and wherein said multicasting comprises 
multicasting the event up to C times, each multicasting 
being to interested consumers within a different cluster 
of said C clusters. 

14. The at least one program storage device of claim 13, 
wherein said clustering comprises grouping brokers within 
said C clusters using geographic proximity of brokers within 
said publish-subscribe system. 

15. The at least one program storage device of claim 13, 
wherein said using comprises: matching the event against all 
subscriptions of the publish-subscribe system; sorting a 
resulting list of brokers having subscriptions for the event by 
cluster; thereafter, ascertaining the multicast group in each 
cluster that contains those brokers destined to receive the 
event; and performing up to C multicasts of the event to 
those multicast groups of the C clusters, wherein each 
multicast goes to a different cluster of said C clusters. 

16. The at least one program storage device of claim 15, 
wherein said ascertaining the multicast group comprises 
ascertaining the multicast group of each cluster that contains 
precisely those brokers destined to receive the event. 

17. The at least one program storage device of claim 15, 
wherein said ascertaining the multicast group comprises 
ascertaining the multicast group of each cluster that 
approximately contains those brokers destined to receive the event, 
wherein the ascertaining is approximate in that one multicast 
group in each cluster comprises an "all broker" multicast 
group, and said performing up to C multicasts comprises, for 
a multicast to a particular cluster, using the "all broker" 
multicast group if the event is to be published to more than 
a threshold number T of brokers in the cluster, wherein T< 
a total number of brokers in the cluster. 

18. The at least one program storage device of claim 13, 
wherein said mapping comprises clustering all brokers of the 
publish-subscribe system into C clusters, each cluster 
comprising a mutually exclusive subset of brokers of the 
publish-subscribe system. 

19. The at least one program storage device of claim 18, 
wherein said clustering comprises choosing a broker of the 
publish-subscribe system that has not already been allocated 
to a cluster and building a cluster by using latency between 
brokers to group brokers within the cluster. 
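
For illustration, one possible reading of this latency-based clustering, following the flow of fig. 2, is sketched below. The function name, the `latency` callable and the `rng` parameter are hypothetical names introduced for the sketch. When the number of brokers is not an exact multiple of the desired number of clusters, this sketch simply produces a smaller final cluster, which is an assumption rather than something the claims specify.

```python
import random


def cluster_brokers(brokers, latency, num_clusters, rng=None):
    """Greedily group brokers into clusters of roughly equal size.

    brokers:      list of broker identifiers.
    latency:      callable (a, b) -> latency between brokers a and b (assumed).
    num_clusters: desired number of clusters C.
    """
    rng = rng or random.Random()
    # Nc = total number of brokers / desired number of clusters.
    size = max(1, len(brokers) // num_clusters)
    unassigned = list(brokers)
    clusters = []
    while unassigned:
        # Choose a random unassigned broker B and start a cluster with it.
        seed = rng.choice(unassigned)
        unassigned.remove(seed)
        cluster = [seed]
        # Add the unassigned broker nearest to B until the cluster is full.
        while unassigned and len(cluster) < size:
            nearest = min(unassigned, key=lambda b: latency(seed, b))
            unassigned.remove(nearest)
            cluster.append(nearest)
        clusters.append(cluster)
    return clusters
```

With brokers placed on a line and latency taken as distance, two well-separated groups of three brokers fall into two clusters regardless of which broker is chosen first.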

20. The at least one program storage device of claim 19, 
wherein said mapping further comprises for each cluster, 
assigning brokers to each individual broker subset derived 
from a total number of brokers in the cluster, wherein one 
multicast group is assigned to each such broker subset, and 
wherein the brokers in each subset join the multicast group 
assigned to that subset. 
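
As a hedged sketch of the group assignment, following figs. 3A and 3B, a broker numbered Bi within cluster Ci joins every group whose row value Ri has a "1" in position Bi, optionally limited to rows with no more than a threshold number of "1"s. The function name and the convention that positions are counted from the left of the bit string are assumptions of this sketch, and the separate "cluster broadcast" group is not modeled.

```python
from itertools import product


def groups_to_join(cluster_id, broker_index, brokers_in_cluster, threshold=None):
    """List the (cluster id, row bits) groups that one broker joins.

    broker_index: position Bi of the broker within its cluster, counted
                  from the left of the bit string (assumed convention).
    threshold:    if given, skip rows with more than this many 1 bits.
    """
    joined = []
    for bits in product((0, 1), repeat=brokers_in_cluster):
        if bits[broker_index] != 1:
            continue  # this broker is not a member of that subset
        if threshold is not None and sum(bits) > threshold:
            continue  # large subsets are served by cluster broadcast instead
        joined.append((cluster_id, "".join(str(bit) for bit in bits)))
    return joined
```

Without a threshold, each broker in a cluster of K brokers joins 2^(K-1) groups, which suggests why keeping clusters small and applying a threshold both reduce the number of groups in use.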

21. The at least one program storage device of claim 18, 
further comprising providing a publisher matching tree on 
each publishing broker of the publish-subscribe system, and 
providing a subscriber matching tree on each subscribing 
broker of the publish-subscribe system, wherein the pub- 
lisher matching tree contains subscriptions annotated with a 
2D bit vector of C rows and K columns, wherein C equals 
a number of clusters, and K equals a number of brokers 
within the particular cluster, and wherein said subscriber 
matching tree comprises subscriptions annotated with con- 
sumer ids. 

22. The at least one program storage device of claim 21, 
wherein said using comprises at a publishing broker, creat- 
ing a 2D bit-vector "mask" of all zeros, wherein a number of 
rows of the "mask" equals a number of clusters and a 
number of columns of the "mask" equals a number of 
brokers within a cluster, traversing the publisher matching 
tree, and for each leaf node visited, performing an "OR" 
operation between the "mask" and the 2D bit vector anno- 
tation at the leaf node, and upon completion of said travers- 
ing of the publisher matching tree, multicasting the event 
once to each cluster having at least one "1" in a correspond- 
ing row of a result mask obtained from said performing the 
"OR" operation, wherein multicast addresses are determined 
by appending each row in the result mask to a corresponding 
cluster id (in binary), and subsequent to said multicasting, 
traversing the subscriber matching tree at a subscribing 
broker, and sending the event to each subscribing client 
thereof. 
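
The mask construction at the publishing broker might be sketched as follows, as an illustration only. The traversal of the publisher matching tree is not modeled: `matched_annotations` is a hypothetical list standing in for the 2D bit vector annotations found at the leaf nodes visited, and both function names are assumptions of this sketch.

```python
def build_result_mask(matched_annotations, num_clusters, brokers_per_cluster):
    """OR together the 2D bit vectors of all matched subscriptions.

    matched_annotations: iterable of C-by-K lists of 0/1 bits, one per
                         leaf node visited (assumed representation).
    """
    # Start from a mask of all zeros: C rows, K columns.
    mask = [[0] * brokers_per_cluster for _ in range(num_clusters)]
    for annotation in matched_annotations:
        for c in range(num_clusters):
            for k in range(brokers_per_cluster):
                mask[c][k] |= annotation[c][k]
    return mask


def clusters_to_multicast(mask):
    """Return the clusters whose row has at least one 1 bit."""
    return [c for c, row in enumerate(mask) if any(row)]
```

Each cluster returned by `clusters_to_multicast` receives the event exactly once, so the number of multicasts never exceeds the number of rows C.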

23. The at least one program storage device of claim 22, 
wherein said multicast addresses are determined as follows: 
for cluster id Ci, if a corresponding row in the result mask 
has no more than a threshold T of "1"s, its multicast address 
is Ci appended with the row's value; and for cluster id Ci, 
if a corresponding row in the result mask has more than the 
threshold T of "1"s, the multicast address is a special 
"cluster broadcast" address. 

24. The at least one program storage device of claim 23, 
further comprising at the subscribing broker, discarding the 
event if there are no subscribers to the event. 

25. A system for implementing a content-based publish- 
subscribe system using a group-based multicast, said system 
comprising: 

means for mapping possible groups of the content-based 
publish-subscribe system to a smaller number of mul- 
ticast groups, wherein said smaller number of multicast 
groups comprise brokers, said brokers having consum- 
ers; 

means for using the smaller number of multicast groups to 
forward an event to interested consumers within the 
content-based publish-subscribe system; 

wherein said means for mapping comprises means for 
clustering brokers of the publish-subscribe system into 
C clusters, wherein each cluster of said C clusters has 
its own subset of multicast groups, and wherein C>1; 
and 

wherein said means for using comprises means for mul- 
ticasting the event to interested consumers using the 
smaller number of groups, and wherein said means for 
multicasting comprises means for multicasting the 
event up to C times, each multicasting being to inter- 
ested consumers within a different cluster of said C 
clusters. 

26. The system of claim 25, wherein said means for 
clustering comprises means for grouping brokers within said 
C clusters using geographic proximity of brokers within said 
publish-subscribe system. 

27. The system of claim 25, wherein said means for using 
comprises: means for matching the event against all sub- 
scriptions of the publish-subscribe system; means for sorting 
a resulting list of brokers having subscriptions for the event 
by cluster; thereafter, means for ascertaining the multicast 
group in each cluster that contains those brokers destined to 
receive the event; and means for performing up to C 
multicasts of the event to those multicast groups of the C 
clusters, wherein each multicast goes to a different cluster of 
said C clusters. 

28. The system of claim 19, wherein said means for 
ascertaining the multicast group comprises means for ascer- 
taining the multicast group of each cluster that contains 
precisely those brokers destined to receive the event. 

29. The system of claim 19, wherein said means for 
ascertaining the multicast group comprises means for ascer- 
taining the multicast group of each cluster that approxi- 
mately contains those brokers destined to receive the event, 
wherein the means for ascertaining is approximate in that 
one multicast group in each cluster comprises an "all broker" 
multicast group, and said means for performing up to 
C multicasts comprises, for a multicast to a particular 
cluster, means for using the "all broker" multicast group if 
the event is to be published to more than a threshold number 
T of brokers in the cluster, wherein T< a total number of 
brokers in the cluster. 

30. The system of claim 25, wherein said means for 
mapping comprises means for clustering all brokers of the 
publish-subscribe system into C clusters, each cluster com- 
prising a mutually exclusive subset of brokers of the 
publish-subscribe system. 

31. The system of claim 30, wherein said means for 
clustering comprises means for choosing a broker of the 
publish-subscribe system that has not already been allocated 
to a cluster and means for building a cluster by using latency 
between brokers to group brokers within the cluster. 

32. The system of claim 31, wherein said means for 
mapping further comprises for each cluster, means for 
assigning brokers to each individual broker subset derived 
from a total number of brokers in the cluster, wherein one 
multicast group is assigned to each such broker subset, and 
wherein the brokers in each subset join the multicast group 
assigned to that subset. 

33. The system of claim 30, further comprising means for 
providing a publisher matching tree on each publishing 
broker of the publish-subscribe system, and means for 
providing a subscriber matching tree on each subscribing 
broker of the publish-subscribe system, wherein the pub- 
lisher matching tree contains subscriptions annotated with a 
2D bit vector of C rows and K columns, wherein C equals 
a number of clusters, and K equals a number of brokers 
within the particular cluster, and wherein said subscriber 
matching tree comprises subscriptions annotated with con- 
sumer ids. 

34. The system of claim 33, wherein said means for using 
comprises at a publishing broker, means for creating a 2D 
bit-vector "mask" of all zeros, wherein a number of rows of 
the "mask" equals a number of clusters and a number of 
columns of the "mask" equals a number of brokers within a 
cluster, means for traversing the publisher matching tree, 
and for each leaf node visited, means for performing an 
"OR" operation between the "mask" and the 2D bit vector 
annotation at the leaf node, and upon completion of said 
traversing of the publisher matching tree, means for multi- 
casting the event once to each cluster having at least one "1" 
in a corresponding row of a result mask obtained from said 
means for performing the "OR" operation, wherein multi- 
cast addresses are determined by appending each row in the 
result mask to a corresponding cluster id (in binary), and 
subsequent to said multicasting, means for traversing the 
subscriber matching tree at a subscribing broker, and send- 
ing the event to each subscribing client thereof. 

35. The system of claim 34, wherein said multicast 
addresses are determined as follows: for cluster id Ci, if a 
corresponding row in the result mask has no more than a 
threshold T of "1"s, its multicast address is Ci appended with 
the row's value; and for cluster id Ci, if a corresponding row 
in the result mask has more than the threshold T of "1"s, the 
multicast address is a special "cluster broadcast" address. 

36. The system of claim 35, further comprising at the 
subscribing broker, means for discarding the event if there 
are no subscribers to the event. 

* * * * * 